摘要 :
Gate Sizing is an important step in logic synthesis, where the cells are resized to optimize metrics such as area, timing, power, leakage, etc. In this work, we consider the gate sizing problem for leakage power optimization with ...
展开
Gate Sizing is an important step in logic synthesis, where the cells are resized to optimize metrics such as area, timing, power, leakage, etc. In this work, we consider the gate sizing problem for leakage power optimization with timing constraints. Lagrangian Relaxation is a widely employed optimization method for gate sizing problems. We accelerate Lagrangian Relaxation-based algorithms by narrowing down the range of cells to resize. In particular, we formulate a heterogeneous directed graph to represent the timing graph, propose a heterogeneous graph neural network as the encoder, and train in the way of imitation learning to mimic the selection behavior of each iteration in Lagrangian Relaxation. This network is used to predict the set of cells that need to be changed during the optimization process of Lagrangian Relaxation. Experiments show that our accelerated gate sizer could achieve comparable performance to the baseline with an average of 22.5% runtime reduction.
收起
摘要 :
To increase the utilization of FPGAs in multi-FPGA based systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by ...
展开
To increase the utilization of FPGAs in multi-FPGA based systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by the inter-FPGA connections becomes significant. Previous research shows that TDM ratio of signals can greatly affect the performance of a system. In this paper, we extend previous problem formulation to meet more general constraints in multi-FPGA based systems and propose a novel approach to solve it. In particular, to optimize system clock period effectively and efficiently, we propose a two-step analytical framework, which first gives a continuous result using a non-linear conjugate gradient-based method and then finalizes the result optimally by a dynamic programming-based discretization algorithm. For comparison, we also solve the problem using an integer linear programming (ILP)-based method. Experimental results show that our approach can improve the system clock period by about 7% on top of a well optimized inter-FPGA routing result. Moreover, our approach scales for designs over 400K nodes while ILP-based method is not able to finish for designs with 2K nodes.
收起
摘要 :
To increase the utilization of FPGAs in multi-FPGA based systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by ...
展开
To increase the utilization of FPGAs in multi-FPGA based systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by the inter-FPGA connections becomes significant. Previous research shows that TDM ratio of signals can greatly affect the performance of a system. In this paper, we extend previous problem formulation to meet more general constraints in multi-FPGA based systems and propose a novel approach to solve it. In particular, to optimize system clock period effectively and efficiently, we propose a two-step analytical framework, which first gives a continuous result using a non-linear conjugate gradient-based method and then finalizes the result optimally by a dynamic programming-based discretization algorithm. For comparison, we also solve the problem using an integer linear programming (ILP)-based method. Experimental results show that our approach can improve the system clock period by about 7% on top of a well optimized inter-FPGA routing result. Moreover, our approach scales for designs over 400K nodes while ILP-based method is not able to finish for designs with 2K nodes.
收起
摘要 :
To increase the resource utilization in multi-FPGA systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by the in...
展开
To increase the resource utilization in multi-FPGA systems, time-division multiplexing (TDM) is a widely used technique to accommodate a large number of inter-FPGA signals. However, with this technique, the delay imposed by the inter-FPGA connections becomes significant. Previous research has shown that the TDM ratios of signals can greatly affect the performance of a system. In this paper, to minimize the system clock period and support more practical constraints in modern multi-FPGA systems, we propose a two-step analytical framework to optimize the TDM ratios of inter-FPGA nets. A Lagrangian relaxation-based method first gives a continuous result under relaxed constraints. A binary search-based discretization algorithm is then used to finalize the TDM ratios such that the resulted maximum displacement is optimal. For comparison, we also solve the problem using linear programming (LP)-based methods, which have guaranteed error bounds to the optimal solutions. Experimental results show that our framework can achieve similar quality with much shorter runtime compared to the LP-based methods. Moreover, our framework scales for designs with over 45000 inter-FPGA nets while the runtime and memory usage of the LP-based methods will increase dramatically as the design scale becomes larger.
收起
摘要 :
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectura...
展开
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectural constraints, we propose several detailed placement techniques, i.e., two-step clock constraint legalization and chain move. After integrating these techniques into our FPGA placement framework, experimental results on ISPD 2017 benchmarks show that our proposed approach yields 2.3\% shorter routed wirelength and the running time is 2x faster compared to the first place winner in the ISPD 2017 contest. Moreover, we explore the possibilities to use machine learning-based methods to predict routing congestion in UltraScale FPGAs. Experimental results on both ISPD 2016 and ISPD 2017 benchmarks show that our proposed congestion estimation model is a good approximation to the one obtained from Vivado and can lead to good placement results compared to the previous methods.
收起
摘要 :
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectura...
展开
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectural constraints, we propose several detailed placement techniques, i.e., two-step clock constraint legalization and chain move. After integrating these techniques into our FPGA placement framework, experimental results on ISPD 2017 benchmarks show that our proposed approach yields 2.3% shorter routed wirelength and the running time is 2× faster compared to the first place winner in the ISPD 2017 contest. Moreover, we explore the possibilities to use machine learning-based methods to predict routing congestion in UltraScale FPGAs. Experimental results on both ISPD 2016 and ISPD 2017 benchmarks show that our proposed congestion estimation model is a good approximation to the one obtained from Vivado and can lead to good placement results compared to the previous methods.
收起
摘要 :
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectura...
展开
As the complexity and scale of circuits keep growing, clocking architectures of FPGAs have become more complex to meet the timing requirement. In this paper, to optimize wirelength and meanwhile meet emerging clocking architectural constraints, we propose several detailed placement techniques, i.e., two-step clock constraint legalization and chain move. After integrating these techniques into our FPGA placement framework, experimental results on ISPD 2017 benchmarks show that our proposed approach yields 2.3\% shorter routed wirelength and the running time is 2x faster compared to the first place winner in the ISPD 2017 contest. Moreover, we explore the possibilities to use machine learning-based methods to predict routing congestion in UltraScale FPGAs. Experimental results on both ISPD 2016 and ISPD 2017 benchmarks show that our proposed congestion estimation model is a good approximation to the one obtained from Vivado and can lead to good placement results compared to the previous methods.
收起
摘要 :
As the complexity and scale of FPGA circuits grows, resolving routing congestion becomes more important in FPGA placement. In this paper, we propose a routability-driven placement algorithm for large-scale heterogeneous FPGAs. Our...
展开
As the complexity and scale of FPGA circuits grows, resolving routing congestion becomes more important in FPGA placement. In this paper, we propose a routability-driven placement algorithm for large-scale heterogeneous FPGAs. Our proposed algorithm consists of (1) partitioning, (2) packing, (3) global placement with congestion estimation, (4) window-base legalization, and (5) routing resource-aware detailed placement. Experimental results show that our proposed approach can give routable placement results for all the benchmarks in the ISPD2016 contest and can achieve good result compared to the other wining teams of the ISPD2016 contest.
收起
摘要 :
As the complexity and scale of FPGA circuits grows, resolving routing congestion becomes more important in FPGA placement. In this paper, we propose a routability-driven placement algorithm for large-scale heterogeneous FPGAs. Our...
展开
As the complexity and scale of FPGA circuits grows, resolving routing congestion becomes more important in FPGA placement. In this paper, we propose a routability-driven placement algorithm for large-scale heterogeneous FPGAs. Our proposed algorithm consists of (1) partitioning, (2) packing, (3) global placement with congestion estimation, (4) window-base legalization, and (5) routing resource-aware detailed placement. Experimental results show that our proposed approach can give routable placement results for all the benchmarks in the ISPD2016 contest and can achieve good result compared to the other wining teams of the ISPD2016 contest.
收起
摘要 :
In multi-FPGA systems, time-division multiplexing (TDM) is a widely used technique to transfer signals between FPGAs. While TDM can greatly increase logic utilization, the inter-FPGA delay will also become longer. A good time-mult...
展开
In multi-FPGA systems, time-division multiplexing (TDM) is a widely used technique to transfer signals between FPGAs. While TDM can greatly increase logic utilization, the inter-FPGA delay will also become longer. A good time-multiplexing scheme for inter-FPGA signals is very important for optimizing the system performance. In this work, we propose a fast algorithm to generate high quality time-multiplexed routing results for multiple FPGA systems. A hybrid routing algorithm is proposed to route the nets between FPGAs, by maze routing and by a fast minimum terminal spanning tree method. After obtaining a routing topology, a two-step method is applied to perform TDM assignment to optimize timing, which includes an initial assignment and a competitive-based refinement. Experiments show that our system-level routing and TDM assignment algorithm can outperform both the top winner of the ICCAD 2019 Contest and the state-of-the-art methods. Moreover, compared to the state-of-the-art works [17], [22], our approach has better run time by more than 2X with better or comparable TDM performance.
收起